Improvement of term frequency-inverse document frequency algorithm based on Document Triage
LI Zhenjun, ZHOU Zhurong
Journal of Computer Applications, 2015, 35(12): 3506-3510. DOI: 10.11772/j.issn.1001-9081.2015.12.3506
The Term Frequency-Inverse Document Frequency (TF-IDF) algorithm does not consider how important the index terms themselves are within a document when it computes their weights. To address this problem, users' reading behaviors were used to improve TF-IDF. By introducing Document Triage into TF-IDF, the Interest Profile Manager (IPM) was used to collect data on users' reading behaviors, from which document scores were computed. Because the text that users annotate is usually important to them, or reflects their interests, an improved term weighting algorithm named Document Triage-Term Frequency-Inverse Document Frequency (DT-TF-IDF) was proposed. It introduces document scores and users' annotations into TF-IDF and gives annotated terms a greater weight. The experimental results show that the recall, the precision and their harmonic mean (the F-measure) of DT-TF-IDF are all higher than those of the traditional TF-IDF algorithm. DT-TF-IDF is therefore more effective than TF-IDF and improves the accuracy of text similarity calculation.
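The abstract does not give the DT-TF-IDF formula, so the Python sketch below only illustrates the general idea under stated assumptions. It uses one common TF-IDF variant (tf = count / document length, idf = log(N / df)) and then multiplies each weight by a per-document reading score and by an extra factor for annotated terms. The function names `tf_idf`, `dt_tf_idf` and `cosine`, the parameters `doc_scores`, `annotations` and `boost`, and the example scores are all hypothetical and are not taken from the paper or from IPM.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Classic TF-IDF with tf = count / doc length and idf = log(N / df)."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # each document counts once per term
    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        weights.append(
            {t: (c / length) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


def dt_tf_idf(docs, doc_scores, annotations, boost=2.0):
    """Hypothetical DT-TF-IDF sketch (assumed weighting, not the paper's formula).

    doc_scores:  one assumed reading-behaviour score per document
    annotations: one set of user-annotated terms per document
    boost:       assumed extra multiplier applied to annotated terms
    """
    weighted = []
    for w, score, marked in zip(tf_idf(docs), doc_scores, annotations):
        weighted.append(
            {t: v * score * (boost if t in marked else 1.0) for t, v in w.items()}
        )
    return weighted


def cosine(a, b):
    """Cosine similarity between two sparse term -> weight dicts."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    docs = [
        ["tfidf", "weight", "term"],
        ["term", "similarity", "weight"],
        ["reading", "behavior"],
    ]
    plain = tf_idf(docs)
    # Hypothetical reading scores and annotations for the three documents.
    triaged = dt_tf_idf(docs, [0.8, 0.5, 0.3], [{"weight"}, {"weight"}, set()])
    print("TF-IDF similarity:   ", cosine(plain[0], plain[1]))
    print("DT-TF-IDF similarity:", cosine(triaged[0], triaged[1]))
```

Because cosine similarity is scale-invariant, multiplying a whole document vector by one score leaves its similarity to other documents unchanged; in this sketch the document score only matters when weights are compared across documents, and the rise in similarity between the first two documents comes from boosting the shared annotated term "weight".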